108 research outputs found

    Benchmark datasets for biomedical knowledge graphs with negative statements

    Knowledge graphs represent facts about real-world entities. Most of these facts are defined as positive statements. Negative statements are scarce but highly relevant under the open-world assumption. Furthermore, they have been shown to improve the performance of several applications, notably in the biomedical domain. However, no benchmark dataset supports the evaluation of methods that consider these negative statements. We present a collection of datasets for three relation prediction tasks (protein-protein interaction prediction, gene-disease association prediction and disease prediction) that aim at circumventing the difficulties in building benchmarks for knowledge graphs with negative statements. These datasets include data from two successful biomedical ontologies, Gene Ontology and Human Phenotype Ontology, enriched with negative statements. We also generate knowledge graph embeddings for each dataset with two popular path-based methods and evaluate the performance in each task. The results show that the negative statements can improve the performance of knowledge graph embeddings.

    Explainable Representations for Relation Prediction in Knowledge Graphs

    Knowledge graphs represent real-world entities and their relations in a semantically-rich structure supported by ontologies. Exploring this data with machine learning methods often relies on knowledge graph embeddings, which produce latent representations of entities that preserve structural and local graph neighbourhood properties, but sacrifice explainability. However, in tasks such as link or relation prediction, understanding which specific features better explain a relation is crucial to support complex or critical applications. We propose SEEK, a novel approach for explainable representations to support relation prediction in knowledge graphs. It is based on identifying relevant shared semantic aspects (i.e., subgraphs) between entities and learning representations for each subgraph, producing a multi-faceted and explainable representation. We evaluate SEEK on two real-world highly complex relation prediction tasks: protein-protein interaction prediction and gene-disease association prediction. Our extensive analysis using established benchmarks demonstrates that SEEK achieves significantly better performance than standard learning representation methods while identifying both sufficient and necessary explanations based on shared semantic aspects. Comment: 16 pages, 3 figures

    Ontology Matching Techniques for Enterprise Architecture Models

    Current Enterprise Architecture (EA) approaches tend to be generic, based on broad meta-models that cross-cut distinct architectural domains. Integrating these models is necessary for an effective EA process, in order to support, for example, benchmarking of business processes or assessing compliance with structured requirements. However, the integration of EA models faces challenges stemming from structural and semantic heterogeneities that could be addressed by ontology matching techniques. To that end, we used AgreementMakerLight, an ontology matching system, to evaluate a set of state-of-the-art matching approaches that could adequately address some of the heterogeneity issues. We assessed the matching of EA models based on the ArchiMate and BPMN languages, which allowed us to draw conclusions about both the potential and the limitations of these techniques in exploiting the more complex semantics present in these models. Enterprise Architecture (EA) is a practice to support the analysis, design and implementation of a business strategy in an organization, considering its relevant multiple domains. In recent years, a variety of Enterprise Architecture ... To support the matching tasks we have used AgreementMakerLight (AML

    The epidemiology ontology: an ontology for the semantic annotation of epidemiological resources

    BACKGROUND: Epidemiology is a data-intensive and multi-disciplinary subject, where data integration, curation and sharing are becoming increasingly relevant, given its global context and time constraints. The semantic annotation of epidemiology resources is a cornerstone to effectively support such activities. Although several ontologies cover some of the subdomains of epidemiology, we identified a lack of semantic resources for epidemiology-specific terms. This paper addresses this need by proposing the Epidemiology Ontology (EPO) and by describing its integration with other related ontologies into a semantically enabled platform for sharing epidemiology resources. RESULTS: The EPO follows the OBO Foundry guidelines and uses the Basic Formal Ontology (BFO) as an upper ontology. The first version of EPO models several epidemiology and demography parameters as well as transmission of infection processes, participants and related procedures. It currently has nearly 200 classes and is designed to support the semantic annotation of epidemiology resources and data integration, as well as information retrieval and knowledge discovery activities. CONCLUSIONS: EPO is under active development and is freely available at https://code.google.com/p/epidemiology-ontology/. We believe that the annotation of epidemiology resources with EPO will help researchers to gain a better understanding of global epidemiological events by enhancing data integration and sharing.

    Special issue on ontology and linked data matching

    Editorial, Semantic Web Journal 8(2):183-18

    DDB-EDM to FaBiO: The Case of the German Digital Library

    Cultural heritage portals have the goal of providing users with seamless access to all their resources. This paper introduces initial efforts for a user-oriented restructuring of the German Digital Library (DDB). At present, cultural heritage objects (CHOs) in the DDB are modeled using an extended version of the Europeana Data Model (DDB-EDM), which negatively impacts usability and exploration. These challenges can be addressed by exploiting ontologies and building a knowledge graph from the DDB’s voluminous collection. Towards this goal, an alignment of bibliographic metadata from DDB-EDM to the FRBR-Aligned Bibliographic Ontology (FaBiO) is presented.

    Results of the Ontology Alignment Evaluation Initiative 2015

    Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple thesauri to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation and consensus. OAEI 2015 offered 8 tracks with 15 test cases followed by 22 participants. Since 2011, the campaign has been using a new evaluation modality which provides more automation to the evaluation. This paper is an overall presentation of the OAEI 2015 campaign.

    Metrics for GO based protein semantic similarity: a systematic evaluation

    BACKGROUND: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether electronic annotations should be used in semantic similarity calculations. RESULTS: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. CONCLUSIONS: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
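
    As an illustration of the two best-performing measures named in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes the usual definitions: simGIC as the ratio of summed information content (IC) over the intersection and the union of two ancestor-extended term sets, and Resnik's measure as the IC of the most informative common ancestor, combined with a best-match average. The toy ontology `PARENTS` and the frequencies `FREQ` are hypothetical and do not come from Gene Ontology data.

```python
from math import log

# Hypothetical toy ontology (NOT real GO data): term -> set of direct parents.
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "a1": {"a"},
    "a2": {"a"},
}

# Hypothetical annotation frequencies: fraction of gene products annotated
# with each term or any of its descendants.
FREQ = {"root": 1.0, "a": 0.5, "b": 0.5, "a1": 0.25, "a2": 0.25}


def ic(term):
    """Information content of a term: -log of its annotation frequency."""
    return -log(FREQ[term])


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def extend(terms):
    """Extend a set of annotation terms with all of their ancestors."""
    extended = set()
    for term in terms:
        extended |= ancestors(term)
    return extended


def sim_gic(terms1, terms2):
    """IC-weighted Jaccard index over the ancestor-extended term sets."""
    ext1, ext2 = extend(terms1), extend(terms2)
    union_ic = sum(ic(t) for t in ext1 | ext2)
    if union_ic == 0:
        return 0.0
    return sum(ic(t) for t in ext1 & ext2) / union_ic


def resnik(term1, term2):
    """IC of the most informative common ancestor.

    Assumes the two terms share at least one ancestor (here, "root").
    """
    return max(ic(t) for t in ancestors(term1) & ancestors(term2))


def best_match_average(terms1, terms2):
    """Average of each term's best Resnik match, taken in both directions."""
    forward = sum(max(resnik(a, b) for b in terms2) for a in terms1) / len(terms1)
    backward = sum(max(resnik(a, b) for a in terms1) for b in terms2) / len(terms2)
    return (forward + backward) / 2


if __name__ == "__main__":
    print(sim_gic({"a1"}, {"a2"}))
    print(best_match_average({"a1"}, {"a2", "b"}))
```

    Unlike a plain average or maximum over all term pairs, the best-match average only uses each term's best partner, which is one way to reduce the dependence on the number of terms being combined that the abstract describes.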

    The immunopeptidome from a genomic perspective: Establishing the noncanonical landscape of MHC class I–associated peptides

    G.B., D.B., K.W., A.P., R.F., T.R.H., S.K., and J.A.A. received support from Fundacja na rzecz Nauki Polskiej (FNP) (grant ID: MAB/3/2017). D.R.G. received support from Genome Canada & Genome BC (grant ID: 264PRO). D.J.H. received support from NuCana plc (grant ID: SMD0-ZIUN05). H.A. received support from Swedish Cancer Foundation (grant ID: 211709). H.G. received support from United Kingdom Research and Innovation (UKRI) (grant ID: EP/S02431X/1). C.P. received support from Fundação para a Ciência e a Tecnologia (FCT) through LASIGE Research Unit (grant ID: UIDB/00408/2020 and UIDP/00408/2020). A.L., F.M.Z., C.P., A.R., A.P., and J.A.A. received support from European Union’s Horizon 2020 research and innovation programme (grant ID: 101017453). C.B. received support from Agence Nationale de la Recherche (ANR) through GRAL LabEX (grant ID: ANR-10-LABX-49-01) and CBH-EUR-GS 32 (grant ID: ANR-17-EURE0003). S.N.S. received support from Cancer Research UK (CRUK) and the Chief Scientist's Office of Scotland (CSO): Experimental Cancer Medicine Centre (ECMC) (grant ID: ECMCQQR-2022/100017). A.L. received support from Chief Scientist's Office of Scotland (CSO) NRS Career Researcher Fellowship. R.O.N. received support from CRUK Cambridge Centre Thoracic Cancer Programme (grant ID: CTRQQR-2021\100012).
    Tumor antigens can emerge through multiple mechanisms, including translation of non-coding genomic regions. This non-canonical category of antigens has recently gained attention; however, our understanding of how they recur within and between cancer types is still in its infancy. Therefore, we developed a proteogenomic pipeline based on deep learning de novo mass spectrometry to enable the discovery of non-canonical MHC-associated peptides (ncMAPs) from non-coding regions. Considering that the emergence of tumor antigens can also involve post-translational modifications, we included an open search component in our pipeline. Leveraging the wealth of mass spectrometry-based immunopeptidomics, we analyzed 26 MHC class I immunopeptidomic studies of 9 different cancer types. We validated the de novo identified ncMAPs, along with the most abundant post-translational modifications, using spectral matching and controlled their false discovery rate (FDR) to 1%. Interestingly, the non-canonical presentation appeared to be 5 times enriched for the A03 HLA supertype, with a projected population coverage of 54.85%. Here, we reveal an atlas of 8,601 ncMAPs with varying levels of cancer selectivity and suggest 17 cancer-selective ncMAPs as attractive targets according to a stringent cutoff. In summary, the combination of the open-source pipeline and the atlas of ncMAPs reported herein could facilitate the identification and screening of ncMAPs as targeting agents for T-cell therapies or vaccine development.